[mlir][xegpu] Add OptimizeBlockLoads pass. #165483
Conversation
akroviakov left a comment
Some preliminary comments
%4:4 = scf.for %arg3 = %c0 to %c256 step %c32 iter_args(%arg4 = %1, %arg5 = %1, %arg6 = %1, %arg7 = %1)
    -> (vector<8x16xf32>, vector<8x16xf32>, vector<8x16xf32>, vector<8x16xf32>) {
  %6 = xegpu.load_nd %3[%c0, %arg3] { layout_result_0 = #b }
    : !xegpu.tensor_desc<32x16xf16, #b, #xegpu.block_tdesc_attr<array_length = 2 : i64>> -> vector<2x32x16xf16>
The #b layout is 2D but the result vector is 3D. I think we should consider just using a 2D shape (32x32 instead of 2x32x16) for the block array load.
That needs some changes to XeGPU op verification and some existing lowering passes, so I would like to keep this test to show the current capability of this pass.
I will do the cleanup for the 3D vector representation in another PR.
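For reference, a minimal sketch of the two result shapes under discussion. The first form is the one used in this PR's test; the 2D variant is hypothetical and, per the reply above, is not accepted by the current verifier/lowerings yet.

```mlir
// Current form: array_length = 2 is surfaced as a leading dimension, so the
// loaded value is 3D.
%res_3d = xegpu.load_nd %3[%c0, %arg3] { layout_result_0 = #b }
  : !xegpu.tensor_desc<32x16xf16, #b, #xegpu.block_tdesc_attr<array_length = 2 : i64>>
  -> vector<2x32x16xf16>

// Suggested 2D alternative (hypothetical): the two 32x16 array blocks are
// folded into a single 32x32 result, matching the 2D #b layout.
%res_2d = xegpu.load_nd %3[%c0, %arg3] { layout_result_0 = #b }
  : !xegpu.tensor_desc<32x16xf16, #b, #xegpu.block_tdesc_attr<array_length = 2 : i64>>
  -> vector<32x32xf16>
```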
// converted.
target.addDynamicallyLegalOp<xegpu::CreateNdDescOp>(
    [&](xegpu::CreateNdDescOp createNdOp) {
      return !canBeOptimized(createNdOp.getType());
Consider using a name like `canLoadTransposed`.
Renamed to `canBeCanonicalizedForTranspose`. I don't want to mention any specific op (like Load) in the name because we are modifying an op sequence here.
VectorType origVectorType =
    VectorType::get(origDataShape, adaptorType.getElementType());
Value data;
// Orig data shape is 3D for the array length case.
I think 2D works better for the array block load result.
Agreed. This will be cleaned up; I will create an issue to track it.
Jianhui-Li left a comment
LGTM
rewriter, loc,
VectorType::get(supportedShape, data.getType().getElementType()),
newTensorDesc, ArrayRef<OpFoldResult>{loadOffsetX, loadOffsetY},
origLoadOp.getPackedAttr(), origLoadOp.getTransposeAttr(),
Nit: Do we still need to set the Pack and Transpose attributes? I think this information is associated with the layout, and we only set them at the WI level when the lane-level layout is dropped.
You are right. This only copies the existing attributes from the f16 load to the i32 load; it has no impact.
Just a fly-by comment: are you not adding the new pass to the pipeline?
Ah, sorry, I was unaware of this requirement. I think I will add it once this is properly e2e tested downstream.
LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/129/builds/32450. Here is the relevant piece of the build log for reference.
Not a requirement per se, just curious 😀 Ideally, the pipeline should be kept up to date when possible. But no worries if a new pass/transform needs more testing first (standalone or within the pipeline) or might not be universally applicable.
This pass rewrites certain xegpu `CreateNd` and `LoadNd` operations that feed into `vector.transpose` into a more optimal form to improve performance. Specifically, low-precision (bitwidth < 32) `LoadNd` ops that feed into transpose ops are rewritten to i32 loads with a valid transpose layout, so that later passes can use the load-with-transpose HW feature to accelerate such load ops.

Update: The pass is renamed to `OptimizeBlockLoads` because we later plan to add the array-length optimization to this pass as well. That optimization will break a larger load (like `32x32xf16`) down into more DPAS-favorable array-length loads (`32x16xf16` with array length = 2). Both optimizations require rewriting `CreateNd` and `LoadNd`, so it makes sense to have a common pass for both.
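For illustration, here is a schematic sketch of the pattern targeted by the transpose rewrite and the kind of IR it is rewritten into. This is hand-written, not taken from the pass's tests: the shapes, value names (`%src`, `%src_i32`, `%c0`), and the omitted layout/attribute plumbing are assumptions.

```mlir
// Before (schematic): a low-precision (f16) block load whose user is a
// vector.transpose. HW transposed loads require 32-bit elements, so this
// pattern cannot use the load-with-transpose feature directly.
%c0   = arith.constant 0 : index
%desc = xegpu.create_nd_tdesc %src : memref<64x64xf16> -> !xegpu.tensor_desc<16x16xf16>
%val  = xegpu.load_nd %desc[%c0, %c0] : !xegpu.tensor_desc<16x16xf16> -> vector<16x16xf16>
%valT = vector.transpose %val, [1, 0] : vector<16x16xf16> to vector<16x16xf16>

// After (schematic): the pass describes the same data with an i32 element
// type (two f16 values packed per i32 along the row, so the inner dimension
// is halved) and loads it as i32, attaching a transpose-capable layout so a
// later pass can lower it to an HW transposed load. The layout attributes and
// the rewiring of the vector.transpose user are omitted here; %src_i32 is a
// placeholder for the i32 view of the source buffer.
%desc_i32 = xegpu.create_nd_tdesc %src_i32 : memref<64x32xi32> -> !xegpu.tensor_desc<16x8xi32>
%val_i32  = xegpu.load_nd %desc_i32[%c0, %c0] : !xegpu.tensor_desc<16x8xi32> -> vector<16x8xi32>
%val_f16  = vector.bitcast %val_i32 : vector<16x8xi32> to vector<16x16xf16>
```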